Method for processing sparse-view computed tomography image using neural network and apparatus therefor

ABSTRACT

A method for processing a sparse-view computed tomography (CT) image using a neural network and an apparatus therefor are provided. The method includes receiving sparse-view CT data and reconstructing an image for the sparse-view CT data using a neural network of a learning model satisfying a predetermined frame condition.

CROSS-REFERENCE TO RELATED APPLICATIONS

This work was supported by Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) [2016-0-00562(R0124-16-0002), Emotional Intelligence Technology to Infer Human Emotion and Carry on Dialogue Accordingly]. This application claims priority under 35 U.S.C. § 119 to Korean Patent Application No. 10-2018-0060849 filed on May 29, 2018, in the Korean Intellectual Property Office, the disclosures of which are incorporated by reference herein in their entireties.

BACKGROUND

Embodiments of the inventive concept described herein relate to a method for processing images using a neural network and an apparatus therefor, and more particularly, relate to a method for reconstructing a sparse-view computed tomography (CT) image as a high-quality image using a neural network for a learning model satisfying a predetermined condition and an apparatus therefor.

CT, an imaging technique for obtaining cross-sectional images of objects, transmits X-rays through an object, obtains the attenuated X-rays, and reconstructs CT images from the obtained X-rays. Because CT uses X-rays, radiation exposure is emerging as a major issue. Various studies have been conducted to address this problem, including low-dose CT, which reduces the intensity of the X-rays, and interior tomography, which irradiates X-rays only to local areas to generate CT images. Furthermore, there is sparse-view CT, which reduces the number of projection views, as a method for reducing the X-ray dose.

Sparse-view CT is a method that lowers the radiation dose by reducing the number of projection views. While sparse-view CT may not be useful for existing multi-detector CT (MDCT) due to its fast and continuous acquisition of projection views, there are many new applications of sparse-view CT, such as spectral CT using alternating kVp switching, CT with a dynamic beam blocker, and the like. Moreover, in C-arm CT or dental CT applications, the scan time is limited primarily by the relatively slow speed of the flat-panel detector rather than by the mechanical gantry speed, so sparse-view CT offers an opportunity to reduce the scan time.

However, insufficient projection views in sparse-view CT produce severe streaking artifacts in filtered-backprojection (FBP) reconstruction. To address this, conventional technologies have investigated compressed sensing approaches that minimize the total variation (TV) or other sparsity-inducing penalties under a data fidelity term. These approaches are, however, computationally expensive due to the repeated applications of projection and back-projection during the iterative update steps.

Recently, deep learning approaches have achieved tremendous success in various fields, such as classification, segmentation, denoising, and super-resolution. In CT applications, a previous approach provided a systematic study of deep convolutional neural networks (CNNs) for low-dose CT and showed that a deep CNN using directional wavelets is more efficient in removing low-dose-related CT noise. This work was followed by many novel extensions for low-dose CT. Unlike these low-dose artifacts caused by reduced tube currents, the streaking artifacts originating from sparse projection views are globalized artifacts that are difficult to remove using conventional denoising CNNs. To address this problem, previous technologies proposed residual learning networks using U-Net. Because the streaking artifacts are globally distributed, a CNN architecture with a large receptive field was shown to be essential in these works, and their performance was significantly better than that of the existing approaches.

SUMMARY

Embodiments of the inventive concept provide a method for reconstructing a sparse-view CT image as a high-quality image using a neural network for a learning model satisfying a predetermined frame condition and an apparatus therefor.

According to an exemplary embodiment, an image processing method may include receiving sparse-view computed tomography (CT) data and reconstructing an image for the sparse-view CT data using a neural network of a learning model satisfying a predetermined frame condition.

The reconstructing of the image may include reconstructing the image for the sparse-view CT data using the neural network of the learning model which satisfies the frame condition and is learned by residual learning.

The neural network may include a neural network which generates the learning model satisfying the frame condition through a mathematical analysis based on convolutional framelets and is trained using the learning model.

The neural network may include a multi-resolution neural network including pooling and unpooling layers.

The neural network may include a structured dual frame neural network, obtained by expressing a mathematical expression of the multi-resolution neural network as a dual frame, and a structured tight frame neural network, obtained by decomposing the multi-resolution neural network into a low-frequency domain and a high-frequency domain using wavelets.

The neural network may include a by-pass connection from the pooling layer to the unpooling layer.

According to an exemplary embodiment, an image processing method may include receiving sparse-view CT data and reconstructing an image for the sparse-view CT data using a neural network for a learning model which satisfies a predetermined frame condition and is based on convolutional framelets.

The neural network may include a multi-resolution neural network including pooling and unpooling layers.

The neural network may include a structured dual frame neural network, obtained by expressing a mathematical expression of the multi-resolution neural network as a dual frame, and a structured tight frame neural network, obtained by decomposing the multi-resolution neural network into a low-frequency domain and a high-frequency domain using wavelets.

According to an exemplary embodiment, an image processing device may include a reception unit configured to receive sparse-view CT data and a reconstruction unit configured to reconstruct an image for the sparse-view CT data using a neural network of a learning model satisfying a predetermined frame condition.

The reconstruction unit may be configured to reconstruct the image for the sparse-view CT data using the neural network of the learning model which satisfies the frame condition and is learned by residual learning.

The neural network may include a neural network which generates the learning model satisfying the frame condition through a mathematical analysis based on convolutional framelets and is trained using the learning model.

The neural network may include a multi-resolution neural network including pooling and unpooling layers.

The neural network may include a structured dual frame neural network, obtained by expressing a mathematical expression of the multi-resolution neural network as a dual frame, and a structured tight frame neural network, obtained by decomposing the multi-resolution neural network into a low-frequency domain and a high-frequency domain using wavelets.

The neural network may include a by-pass connection from the pooling layer to the unpooling layer.

BRIEF DESCRIPTION OF THE FIGURES

The above and other objects and features will become apparent from the following description with reference to the following figures, wherein like reference numerals refer to like parts throughout the various figures unless otherwise specified, and wherein:

(a) and (b) of FIG. 1 are drawings illustrating an example of CT streaking artifact patterns in the reconstruction images from 48 projection views;

(a) and (b) of FIG. 2 are drawings illustrating an example of comparing sizes of receptive fields according to a structure of a network or a neural network;

FIG. 3 is an operational flowchart illustrating an image processing method according to an embodiment of the inventive concept;

(a), (b), and (c) of FIG. 4 are drawings illustrating a simplified U-Net architecture, a dual frame U-Net architecture, and a tight frame U-Net architecture;

FIGS. 5A, 5B, and 5C are drawings illustrating a standard U-Net architecture, a dual frame U-Net architecture, and a tight frame U-Net architecture;

(a), (b), and (c) of FIG. 6 are drawings illustrating an example of reconstruction results by general, dual frame, and tight frame U-Nets at various sparse-view reconstructions; and

FIG. 7 is a block diagram illustrating a configuration of an image processing device according to an embodiment of the inventive concept.

DETAILED DESCRIPTION

Advantages, features, and methods of accomplishing the same will become apparent with reference to embodiments described in detail below together with the accompanying drawings. However, the inventive concept is not limited by the embodiments disclosed hereinafter and may be implemented in various forms. Rather, these embodiments are provided so that this disclosure will be thorough and complete and will fully convey the concept of the invention to those skilled in the art, and the inventive concept will only be defined by the appended claims.

Terms used in the specification are used to describe embodiments of the inventive concept and are not intended to limit the scope of the inventive concept. In the specification, the terms of a singular form may include plural forms unless otherwise specified. The expressions "comprise" and/or "comprising" used herein indicate the existence of stated components, steps, operations, and/or elements but do not exclude the presence or addition of one or more other components, steps, operations, and/or elements.

Unless otherwise defined herein, all terms (including technical and scientific terms) used in the specification have the same meaning that is generally understood by a person skilled in the art. Also, terms which are defined in a dictionary and commonly used should be interpreted as not having an idealized or overly formal sense unless expressly so defined.

Hereinafter, a description will be given in detail of exemplary embodiments of the inventive concept with reference to the accompanying drawings. Like reference numerals are used for the same components shown in each drawing, and a duplicated description of the same components will be omitted.

The convolutional framelet represents the input signal f using the local basis ψ_j and the non-local basis ϕ_i, as in Equation 1 below.

$f = \frac{1}{d}\sum_{i=1}^{n}\sum_{j=1}^{q}\langle f, \phi_i \circledast \psi_j\rangle\, \tilde{\phi}_i \circledast \tilde{\psi}_j$  [Equation 1]

Herein, ϕ_i denotes the linear transform operator with the non-local basis vector, and ψ_j denotes the linear transform operator with the local basis vector.

In this case, the local basis vector and the non-local basis vector may have the dual basis vectors ψ̃_j and ϕ̃_i, respectively. The orthogonality relationship between the basis vectors may be defined as Equation 2 below.

$\tilde{\Phi}\Phi^{\top} = \sum_{i=1}^{m}\tilde{\phi}_i\phi_i^{\top} = I_{n\times n},\qquad \Psi\tilde{\Psi}^{\top} = \sum_{j=1}^{q}\psi_j\tilde{\psi}_j^{\top} = I_{d\times d}$  [Equation 2]

Using Equation 2 above, the convolutional framelet may be represented as Equation 3 below.

$\mathbb{H}_d(f) = \tilde{\Phi}\Phi^{\top}\mathbb{H}_d(f)\Psi\tilde{\Psi}^{\top} = \tilde{\Phi}C\tilde{\Psi}^{\top},\qquad C = \Phi^{\top}\mathbb{H}_d(f)\Psi = \Phi^{\top}(f \circledast \bar{\Psi})$  [Equation 3]

Herein, ℍ_d denotes the Hankel matrix operator, which allows the convolution operation to be represented as a matrix multiplication, and C denotes the convolutional framelet coefficients, i.e., the signal transformed by the local basis and the non-local basis.

The convolutional framelet coefficients C may be reconstructed into the original signal by applying the dual basis vectors ϕ̃_i and ψ̃_j. The reconstruction process may be represented as Equation 4 below.

$f = (\tilde{\Phi}C) \circledast \nu(\tilde{\Psi})$  [Equation 4]

As such, the technique of representing the input signal through the local basis and the non-local basis is the convolutional framelet.

One of the key ingredients of the deep convolutional framelets is the frame condition for the non-local basis. However, the existing neural network architecture, for example, the U-Net architecture, does not satisfy the frame condition, and it overly emphasizes the low-frequency components of the signal. In sparse-view CT, this deficiency manifests as blurring artifacts in the reconstructed images.

An embodiment of the inventive concept may provide two types of novel network architectures that satisfy the frame condition: a dual frame network and a tight frame network.

Herein, the dual frame network adds a by-pass connection in the low-resolution path to generate a residual signal. The tight frame network with an orthogonal wavelet basis, for example, the Haar wavelet basis, may be implemented by adding a high-frequency path to the existing U-Net structure.

Mathematical Preliminaries

Notations

For a matrix A, R(A) denotes the range space of A, and P_{R(A)} denotes the projection onto the range space of A. The identity matrix is referred to as I. For a given matrix A, the notation A^† refers to the generalized inverse matrix, and the superscript ⊤ of A^⊤ denotes the Hermitian transpose. When a matrix Ψ ∈ ℝ^{pd×q} is partitioned as Ψ = [Ψ₁^⊤ ... Ψ_p^⊤]^⊤ with submatrices Ψ_i ∈ ℝ^{d×q}, then ψ_j^i refers to the j-th column of Ψ_i. A vector v̄ ∈ ℝ^n refers to the flipped version of a vector v ∈ ℝ^n, i.e., its indices are reversed. Similarly, for a given matrix Ψ ∈ ℝ^{d×q}, the notation Ψ̄ ∈ ℝ^{d×q} refers to the matrix of flipped vectors, i.e., Ψ̄ = [ψ̄₁ ... ψ̄_q]. For a block structured matrix Ψ ∈ ℝ^{pd×q}, with a slight abuse of notation, an embodiment of the inventive concept may define Ψ̄ as Equation 5 below.

$\bar{\Psi} = \begin{bmatrix}\bar{\Psi}_1\\ \vdots\\ \bar{\Psi}_p\end{bmatrix},\qquad \text{where}\ \ \bar{\Psi}_i = \begin{bmatrix}\bar{\psi}_1^i & \ldots & \bar{\psi}_q^i\end{bmatrix} \in \mathbb{R}^{d\times q}$  [Equation 5]

Frame

A family of functions {ϕ_k}_{k∈Γ} in a Hilbert space H is called a frame when it satisfies the inequality of Equation 6 below.

$\alpha\|f\|^2 \le \sum_{k\in\Gamma}|\langle f,\phi_k\rangle|^2 \le \beta\|f\|^2,\qquad \forall f\in H$  [Equation 6]

Here, α, β>0 are called the frame bounds. When α=β, then the frame issaid to be tight.

A frame is associated with a frame operator Φ composed of the ϕ_k: Φ = [⋯ ϕ_{k−1} ϕ_k ⋯]. Then, Equation 6 above may be equivalently written as Equation 7 below.

$\alpha\|f\|^2 \le \|\Phi^{\top}f\|^2 \le \beta\|f\|^2,\qquad \forall f\in H$  [Equation 7]

The frame bounds may be represented by Equation 8 below.

$\alpha = \sigma_{\min}(\Phi\Phi^{\top}),\qquad \beta = \sigma_{\max}(\Phi\Phi^{\top})$  [Equation 8]

Here, σ_min(A) and σ_max(A) denote the minimum and maximum singular values of A, respectively.

When the frame lower bound α is non-zero, the original signal may be recovered from the frame coefficients c = Φ^⊤f using the dual frame Φ̃ satisfying the frame condition of Equation 9 below, because f̂ = Φ̃c = Φ̃Φ^⊤f = f.

$\tilde{\Phi}\Phi^{\top} = I$  [Equation 9]

The explicit form of the dual frame is given by the pseudo-inverse, as in Equation 10 below.

$\tilde{\Phi} = (\Phi\Phi^{\top})^{-1}\Phi$  [Equation 10]

If the frame coefficients are contaminated by noise w, i.e., c = Φ^⊤f + w, then the signal recovered using the dual frame is given by f̂ = Φ̃c = Φ̃(Φ^⊤f + w) = f + Φ̃w. Therefore, the noise amplification factor may be computed by Equation 11 below.

$\frac{\|\tilde{\Phi}w\|^2}{\|w\|^2} = \frac{\sigma_{\max}(\Phi\Phi^{\top})}{\sigma_{\min}(\Phi\Phi^{\top})} = \frac{\beta}{\alpha} = \kappa(\Phi\Phi^{\top})$  [Equation 11]

Here, κ(⋅) refers to the condition number.

A tight frame has the minimum noise amplification factor β/α = 1, which is equivalent to the condition of Equation 12 below.

$\Phi\Phi^{\top} = cI,\qquad c > 0$  [Equation 12]
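A short numerical sketch may make Equations 8, 11, and 12 concrete. The code below is illustrative only (random atoms, hypothetical variable names) and is not part of the inventive concept.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frame bounds (Equation 8) and noise amplification (Equation 11)
# for a random frame operator whose columns are the atoms phi_k.
n, m = 8, 16
Phi = rng.standard_normal((n, m))
eig = np.linalg.eigvalsh(Phi @ Phi.T)
alpha, beta = eig.min(), eig.max()
print("condition number kappa =", beta / alpha)

# Tight frame (Equation 12): duplicating an orthonormal basis gives
# Phi Phi^T = 2I, so beta/alpha = 1 and noise is not amplified.
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
Phi_tight = np.hstack([Q, Q])
print(np.allclose(Phi_tight @ Phi_tight.T, 2 * np.eye(n)))  # True
```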

Hankel Matrix

Since the Hankel matrix is an essential component in the theory (K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang, "Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising," arXiv preprint arXiv:1608.03981, 2016) of deep convolutional framelets, an embodiment of the inventive concept briefly reviews it.

To avoid special treatment of the boundary condition, an embodiment of the inventive concept may mainly be derived using the circular convolution. For simplicity, an embodiment of the inventive concept may consider 1-D signal processing, but the extension to 2-D signal processing is straightforward.

Let f = [f[1], ..., f[n]]^⊤ ∈ ℝ^n be the signal vector. A wrap-around Hankel matrix ℍ_d(f) may be defined by Equation 13 below.

$\mathbb{H}_d(f) = \begin{bmatrix} f[1] & f[2] & \ldots & f[d]\\ f[2] & f[3] & \ldots & f[d+1]\\ \vdots & \vdots & \ddots & \vdots\\ f[n] & f[1] & \ldots & f[d-1] \end{bmatrix}$  [Equation 13]

Here, d denotes the matrix pencil parameter.
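Numerically, the wrap-around Hankel matrix of Equation 13 is easy to construct from circular shifts of f; the helper below is an illustrative sketch whose name is not from the disclosure.

```python
import numpy as np

# Wrap-around Hankel matrix H_d(f) of Equation 13: the i-th row is the
# length-d window starting at f[i] with circular indexing.
def hankel_circ(f, d):
    return np.stack([np.roll(f, -i)[:d] for i in range(len(f))])

f = np.arange(1, 7)            # n = 6
print(hankel_circ(f, 3))
# [[1 2 3]
#  [2 3 4]
#  [3 4 5]
#  [4 5 6]
#  [5 6 1]
#  [6 1 2]]   <- last rows wrap around, as in Equation 13
```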

When a multi-channel signal is given as Equation 14 below, an extended Hankel matrix may be constructed by stacking Hankel matrices, as represented in Equation 15 below.

$F := [f_1\ \ldots\ f_p] \in \mathbb{R}^{n\times p}$  [Equation 14]

$\mathbb{H}_{d|p}(F) := [\mathbb{H}_d(f_1)\ \ \mathbb{H}_d(f_2)\ \ldots\ \mathbb{H}_d(f_p)]$  [Equation 15]

Here, the Hankel matrix is closely related to the convolution operations in a CNN. Specifically, for a given convolutional filter ψ̄ = [ψ[d], ..., ψ[1]]^⊤ ∈ ℝ^d, a single-input single-output (SISO) convolution in a CNN may be represented as Equation 16 below using a Hankel matrix.

$y = f \circledast \bar{\psi} = \mathbb{H}_d(f)\psi \in \mathbb{R}^n$  [Equation 16]
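The identity of Equation 16 can be checked numerically: multiplying the wrap-around Hankel matrix by ψ reproduces the circular convolution of f with the flipped, zero-padded kernel. A minimal sketch with hypothetical helper names:

```python
import numpy as np

def hankel_circ(f, d):
    return np.stack([np.roll(f, -i)[:d] for i in range(len(f))])

rng = np.random.default_rng(0)
n, d = 16, 4
f, psi = rng.standard_normal(n), rng.standard_normal(d)

y_hankel = hankel_circ(f, d) @ psi           # H_d(f) psi

psi_pad = np.zeros(n); psi_pad[:d] = psi
psi_flip = np.roll(psi_pad[::-1], 1)         # index reversal modulo n
y_conv = np.real(np.fft.ifft(np.fft.fft(f) * np.fft.fft(psi_flip)))

print(np.allclose(y_hankel, y_conv))         # True
```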

Similarly, a single-input multi-output (SIMO) convolution using a CNN filter kernel Ψ = [ψ₁ ... ψ_q] ∈ ℝ^{d×q} may be represented by Equation 17 below.

$Y = f \circledast \bar{\Psi} = \mathbb{H}_d(f)\Psi \in \mathbb{R}^{n\times q}$  [Equation 17]

Here, q denotes the number of output channels.

A multi-input multi-output (MIMO) convolution in a CNN may be represented by Equation 18 below.

$Y = F \circledast \bar{\Psi} = \mathbb{H}_{d|p}(F)\begin{bmatrix}\Psi_1\\ \vdots\\ \Psi_p\end{bmatrix}$  [Equation 18]

Here, p and q refer to the numbers of input and output channels, respectively. The j-th input-channel filter may be represented as Equation 19 below.

$\Psi_j = [\psi_1^j\ \ldots\ \psi_q^j] \in \mathbb{R}^{d\times q}$  [Equation 19]

The extension to the multi-channel 2-D convolution operation for an image-domain CNN is straightforward, since similar matrix-vector operations may be used; only the definition of the Hankel matrix changes, to a block Hankel matrix or an extended Hankel matrix.

One of the most intriguing properties of the Hankel matrix is that it often has a low-rank structure, and its low-rankness is related to the sparsity in the Fourier domain. This property is extremely useful, as evidenced by its applications to many inverse problems and low-level computer vision problems.
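As a small illustration of this property (assumed test signal, hypothetical names): a signal occupying only four Fourier bins, i.e., two real sinusoids, yields a wrap-around Hankel matrix of rank exactly 4.

```python
import numpy as np

def hankel_circ(f, d):
    return np.stack([np.roll(f, -i)[:d] for i in range(len(f))])

n, d = 64, 16
t = np.arange(n)
# Two sinusoids = four active complex-exponential frequencies.
f = np.cos(2 * np.pi * 3 * t / n) + 0.5 * np.sin(2 * np.pi * 7 * t / n)
print(np.linalg.matrix_rank(hankel_circ(f, d)))   # 4
```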

Deep Convolutional Framelets

An embodiment of the inventive concept may briefly review the theory of deep convolutional framelets. Using the existing Hankel matrix approaches, an embodiment of the inventive concept may consider the regression problem of Equation 20 below.

$\min_{f\in\mathbb{R}^n}\|f^* - f\|^2\qquad \text{subject to}\quad \operatorname{rank}\mathbb{H}_d(f) \le r < d$  [Equation 20]

Here, f* ∈ ℝ^n denotes the ground-truth signal.

The classical approach to the problem of Equation 20 above is to use singular value shrinkage or matrix factorization. However, in deep convolutional framelets, the problem is addressed using learning-based signal representation.

More specifically, for any feasible solution f of Equation 20 above, its Hankel structured matrix ℍ_d(f) has the singular value decomposition ℍ_d(f) = UΣV^⊤, where U = [u₁ ... u_r] ∈ ℝ^{n×r} and V = [v₁ ... v_r] ∈ ℝ^{d×r} denote the left and right singular vector basis matrices, respectively, and Σ = (σ_{ij}) ∈ ℝ^{r×r} is the diagonal matrix of singular values. An embodiment of the inventive concept may consider the matrix pair Φ, Φ̃ ∈ ℝ^{n×n} satisfying the frame condition of Equation 21 below.

$\tilde{\Phi}\Phi^{\top} = I$  [Equation 21]

These bases are referred to as non-local bases since they interact with all n elements of f ∈ ℝ^n by multiplying to the left of ℍ_d(f) ∈ ℝ^{n×d}. An embodiment of the inventive concept may need another matrix pair Ψ, Ψ̃ ∈ ℝ^{d×r} satisfying the low-dimensional subspace constraint of Equation 22 below.

$\Psi\tilde{\Psi}^{\top} = P_{R(V)}$  [Equation 22]

These may be called local bases because they only interact with a d-neighborhood of the signal f ∈ ℝ^n. Using Equations 21 and 22 above, an embodiment of the inventive concept may obtain the Hankel matrix reconstruction identity for the input signal f as Equation 23 below.

$\mathbb{H}_d(f) = \tilde{\Phi}\Phi^{\top}\mathbb{H}_d(f)\Psi\tilde{\Psi}^{\top}$  [Equation 23]

Here, ℍ_d denotes the Hankel matrix operator.

Factorizing Φ^⊤ℍ_d(f)Ψ out of Equation 23 above results in the decomposition of f using a single-layer encoder-decoder architecture, as in Equation 24 below.

$f = (\tilde{\Phi}C) \circledast \nu(\tilde{\Psi}),\qquad C = \Phi^{\top}(f \circledast \bar{\Psi})$  [Equation 24]

Here, the encoder and decoder convolution filters are given by Equation 25 below.

$\bar{\Psi} := [\bar{\psi}_1\ \ldots\ \bar{\psi}_q] \in \mathbb{R}^{d\times q},\qquad \nu(\tilde{\Psi}) := \frac{1}{d}\begin{bmatrix}\tilde{\psi}_1\\ \vdots\\ \tilde{\psi}_q\end{bmatrix} \in \mathbb{R}^{dq}$  [Equation 25]

Equation 24 above is the general form of the signals associated with a rank-r Hankel structured matrix, and an embodiment of the inventive concept is interested in specifying bases for optimal performance. In the deep convolutional framelets, Φ and Φ̃ may correspond to the user-defined generalized pooling and unpooling satisfying the frame condition of Equation 21 above. On the other hand, the filters Ψ, Ψ̃ need to be estimated from the data. To limit the search space for the filters, an embodiment of the inventive concept may consider the set ℋ₀, which consists of the signals that have positive framelet coefficients and may be represented as Equation 26 below.

$\mathcal{H}_0 = \big\{ f \in \mathbb{R}^n \mid f = (\tilde{\Phi}C) \circledast \nu(\tilde{\Psi}),\ C = \Phi^{\top}(f \circledast \bar{\Psi}) \ge 0 \big\}$  [Equation 26]
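The single-layer encoder-decoder of Equations 24 to 26 can be exercised numerically. The sketch below is a linear check under simplifying assumptions (Φ = Φ̃ = I, Ψ = Ψ̃ = V taken from the SVD so that ΨΨ̃^⊤ = P_{R(V)} as in Equation 22, and the ReLU is omitted since the coefficients here may be negative); it is not the trained network.

```python
import numpy as np

def hankel_circ(f, d):
    return np.stack([np.roll(f, -i)[:d] for i in range(len(f))])

def hankel_pinv(M):
    # H_d^dagger: average each wrapped diagonal back into f[k].
    n, d = M.shape
    return np.array([np.mean([M[(k - j) % n, j] for j in range(d)])
                     for k in range(n)])

n, d = 64, 8
t = np.arange(n)
f = np.cos(2 * np.pi * 3 * t / n)            # rank-2 Hankel structure

# Local basis from the SVD so that Psi Psi~^T = P_R(V) (Equation 22).
_, _, Vt = np.linalg.svd(hankel_circ(f, d))
V = Vt[:2].T

C = hankel_circ(f, d) @ V                    # encoder: C = H_d(f) Psi
f_hat = hankel_pinv(C @ V.T)                 # decoder
print(np.allclose(f_hat, f))                 # True: perfect reconstruction
```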

The main goal of the neural network training is to learn (Ψ, Ψ̃) from the training data {(f_(i), f*_(i))}_{i=1}^{N}, assuming that {f*_(i)} is associated with rank-r Hankel matrices. More specifically, the regression problem for the training data under the rank-r Hankel matrix constraint of Equation 20 above may be given by Equation 27 below.

$\min_{\{f_{(i)}\}\in\mathcal{H}_0} \sum_{i=1}^{N} \|f^*_{(i)} - f_{(i)}\|^2$  [Equation 27]

Equation 27 above may be represented as Equation 28 below.

$\min_{\{\Psi,\tilde{\Psi}\}} \sum_{i=1}^{N} \|f^*_{(i)} - Q(f_{(i)};\Psi,\tilde{\Psi})\|^2$  [Equation 28]

Here, Q may be represented as Equation 29 below.

$Q(f_{(i)};\Psi,\tilde{\Psi}) = (\tilde{\Phi}C[f_{(i)}]) \circledast \nu(\tilde{\Psi})$  [Equation 29]

Here, C may be represented as Equation 30 below.

$C[f_{(i)}] = \rho\big(\Phi^{\top}(f_{(i)} \circledast \bar{\Psi})\big)$  [Equation 30]

Here, ρ(⋅) refers to the rectified linear unit (ReLU), which imposes positivity on the framelet coefficients.

After the network is fully trained, the inference for a given noisy input f is simply done by Q(f; Ψ, Ψ̃), which is equivalent to finding a denoised solution that has a rank-r Hankel structured matrix.

In the sparse-view CT problems, it was consistently shown that residual learning with a by-pass connection is better than direct image learning. To investigate this phenomenon systematically, assuming that the input image f_(i) of sparse-view CT is contaminated with streaking artifacts, the input image may be represented as Equation 31 below.

$f_{(i)} = f^*_{(i)} + h_{(i)}$  [Equation 31]

Here, h_(i) denotes the streaking artifacts, and f*_(i) refers to the artifact-free ground-truth image.

Then, instead of the cost function of Equation 28 above, the residual network training may be formulated as Equation 32 below.

$\min_{\{\Psi,\tilde{\Psi}\}} \sum_{i=1}^{N} \|h_{(i)} - Q(f^*_{(i)} + h_{(i)};\Psi,\tilde{\Psi})\|^2$  [Equation 32]

Here, the residual learning scheme is to find the filter Ψ̄ which approximately annihilates the true signal f*_(i), as in Equation 33 below.

$f^*_{(i)} \circledast \bar{\Psi} \simeq 0$  [Equation 33]

The signal decomposition using deep convolutional framelets may then be applied to the streaking artifact signal, as in Equation 34 below.

$(\tilde{\Phi}C[f^*_{(i)} + h_{(i)}]) \circledast \nu(\tilde{\Psi}) \simeq (\tilde{\Phi}C[h_{(i)}]) \circledast \nu(\tilde{\Psi}) = h_{(i)}$  [Equation 34]

Here, the approximation comes from Equation 35 below, thanks to the annihilating property of Equation 33 above.

$C[f^*_{(i)} + h_{(i)}] = \Phi^{\top}\big((f^*_{(i)} + h_{(i)}) \circledast \bar{\Psi}\big) \simeq C[h_{(i)}]$  [Equation 35]

Accordingly, the neural network is trained to learn the structure of the true image so as to annihilate it, while still retaining the artifact signals.
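A toy illustration of the annihilation property (filter chosen by hand rather than learned): a circular difference filter annihilates a constant "true" signal, so the coefficients of the artifact-contaminated input coincide with those of the artifact alone, exactly as in Equation 35.

```python
import numpy as np

n = 32
f_star = np.ones(n)                 # smooth ground-truth signal
h = np.zeros(n); h[::8] = 1.0       # sparse streak-like artifact

# Circular difference filter: annihilates constants (Equation 33).
diff = lambda x: x - np.roll(x, 1)

print(np.allclose(diff(f_star), 0))              # True
print(np.allclose(diff(f_star + h), diff(h)))    # True (Equation 35)
```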

The above-mentioned details may be extended to the multi-layer deep convolutional framelet. More specifically, for the L-layer decomposition, the space ℋ₀ in Equation 26 above may be recursively defined as Equation 36 below.

$\mathcal{H}_0 = \big\{ f \in \mathbb{R}^n \mid f = (\tilde{\Phi}C) \circledast \nu(\tilde{\Psi}),\ C = \Phi^{\top}(f \circledast \bar{\Psi}) \ge 0,\ C \in \mathcal{H}_1 \big\}$  [Equation 36]

Here, ℋ_l, l = 1, ..., L−1, may be defined as Equation 37 below.

$\mathcal{H}_l = \big\{ Z \in \mathbb{R}^{n\times p^{(l)}} \mid Z = (\tilde{\Phi}C^{(l)}) \circledast \nu(\tilde{\Psi}^{(l)}),\ C^{(l)} = \Phi^{\top}(Z \circledast \bar{\Psi}^{(l)}) \ge 0,\ C^{(l)} \in \mathcal{H}_{l+1} \big\},\qquad \mathcal{H}_L = \mathbb{R}^{n\times p^{(L)}}$  [Equation 37]

In Equation 37 above, the l-th layer encoder and decoder filters may be defined by Equations 38 and 39 below.

$\bar{\Psi}^{(l)} := \begin{bmatrix}\bar{\psi}_1^1 & \ldots & \bar{\psi}_{q^{(l)}}^1\\ \vdots & \ddots & \vdots\\ \bar{\psi}_1^{p^{(l)}} & \ldots & \bar{\psi}_{q^{(l)}}^{p^{(l)}}\end{bmatrix} \in \mathbb{R}^{d^{(l)}p^{(l)} \times q^{(l)}}$  [Equation 38]

$\nu(\tilde{\Psi}^{(l)}) := \frac{1}{d}\begin{bmatrix}\tilde{\psi}_1^1 & \ldots & \tilde{\psi}_1^{p^{(l)}}\\ \vdots & \ddots & \vdots\\ \tilde{\psi}_{q^{(l)}}^1 & \ldots & \tilde{\psi}_{q^{(l)}}^{p^{(l)}}\end{bmatrix} \in \mathbb{R}^{d^{(l)}q^{(l)} \times p^{(l)}}$  [Equation 39]

Here, d^{(l)}, p^{(l)}, and q^{(l)} denote the filter length and the numbers of input and output channels, respectively.

As described above, by recursively narrowing the search space of the convolution frames in each layer, an embodiment of the inventive concept may obtain the deep convolutional framelet extension and the associated training scheme.

In short, the non-local bases Φ^⊤ and Φ̃ correspond to the generalized pooling and unpooling operations, while the local bases Ψ and Ψ̃ work as learnable convolution filters. Moreover, for the generalized pooling operation, the frame condition may be the most important prerequisite for enabling the recovery condition and controllable shrinkage behavior, which is the main criterion for constructing the U-Net variants.

FIG. 3 is an operational flowchart illustrating an image processing method according to an embodiment of the inventive concept.

Referring to FIG. 3, the image processing method according to an embodiment of the inventive concept may include receiving (S310) sparse-view CT data and reconstructing (S320) an image for the sparse-view CT data using a neural network of a learning model satisfying a predetermined frame condition.

Herein, operation S320 may be to reconstruct the image for the sparse-view CT data using the neural network of the learning model which satisfies the frame condition and is learned by residual learning.

The neural network used in an embodiment of the inventive concept may include a neural network which generates a learning model satisfying the frame condition through a mathematical analysis based on the convolutional framelets and is trained using that learning model, and may include a multi-resolution neural network including pooling and unpooling layers.

Herein, the neural network may include a structured dual frame neural network, obtained by expressing the mathematical expression of the multi-resolution neural network as the dual frame, and a structured tight frame neural network, obtained by decomposing the multi-resolution neural network into a low-frequency domain and a high-frequency domain using wavelets.

In addition, the neural network may include a by-pass connection from the pooling layer to the unpooling layer.

A description will be given of the method according to an embodiment of the inventive concept with reference to FIGS. 3 to 6.

U-Net for Sparse-View CT and its Limitations

(a) and (b) of FIG. 1 are drawings illustrating an example of CT streaking artifact patterns in the reconstruction images from 48 projection views. (a) and (b) of FIG. 1 show two reconstruction images and their artifact-only images when only 48 projection views are available.

As shown in (a) and (b) of FIG. 1, significant streaking artifacts emanate over the entire image area. This suggests that the receptive field of the convolution filters should cover the entire image to effectively suppress the streaking artifacts.

One of the most important characteristics of a multi-resolution architecture like U-Net (O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional networks for biomedical image segmentation," in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2015, pp. 234-241) is the exponentially large receptive field due to the pooling and unpooling layers.

(a) and (b) of FIG. 2 are drawings illustrating an example of comparing sizes of receptive fields according to the structure of a network or a neural network. (a) and (b) of FIG. 2 compare the size of the receptive field of a single-resolution CNN without pooling layers ((a) of FIG. 2) and a multi-resolution network, U-Net ((b) of FIG. 2).

As shown in (a) of FIG. 2, for the general neural network, the receptive field becomes larger as the signal passes through each convolution layer (Conv). However, because the rate of increase is low, a very deep architecture is required for the receptive field to cover the entire image. As the depth of the neural network grows, it may face the vanishing-gradient problem, which hampers training, or overfitting may occur due to the large number of parameters.

On the other hand, as shown in (b) of FIG. 2, for the U-Net, because the feature map decreases in size as the signal passes through the pooling and unpooling layers, the receptive field relative to the input image becomes larger. Compared with a general neural network of the same depth, the multi-resolution architecture thus has a larger receptive field, which is why architectures such as U-Net are widely used for segmentation in images. From the theoretical viewpoint, however, a clear limit exists in image reconstruction using the U-Net architecture. Furthermore, as may be observed in (a) and (b) of FIG. 2, with the same size convolution filters, the receptive field is enlarged in the network with pooling layers. Thus, a multi-resolution architecture such as U-Net is well suited to sparse-view CT reconstruction, which must deal with globally distributed streaking artifacts.
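The gap can be made concrete with standard receptive-field arithmetic (a back-of-envelope sketch, not specific to the inventive concept): each 3×3 convolution widens the receptive field by two pixels times the current stride, and each 2×2 pooling doubles the stride.

```python
# Standard receptive-field arithmetic: a 3x3 convolution adds
# 2 * stride to the receptive field; a 2x2 pooling adds stride
# and then doubles it.
def receptive_field(layers):
    rf, stride = 1, 1
    for kind in layers:
        if kind == "conv3":
            rf += 2 * stride
        elif kind == "pool2":
            rf += stride
            stride *= 2
    return rf

plain = ["conv3"] * 8                          # single-resolution CNN
pooled = ["conv3", "conv3", "pool2"] * 4       # U-Net-like encoder
print(receptive_field(plain))    # 17
print(receptive_field(pooled))   # 76
```

With twelve layers arranged as four (conv, conv, pool) stages, the receptive field reaches 76 input pixels, versus 17 for eight plain convolution layers of the same filter size.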

(a) of FIG. 4 shows a simplified U-Net architecture. As shown in (a) of FIG. 4, the U-Net delivers the signal of the input unit to the output unit by using the average pooling and average unpooling layers as the non-local bases and through the by-pass connection expressed by a dotted line.

The U-Net is recursively applied to the low-resolution signal. Here, the input f ∈ ℝ^n is first filtered with the local convolution filter Ψ and is then reduced to a half-size approximate signal using a pooling operation Φ. Mathematically, this step may be represented by Equation 40 below.

$C = \Phi^{\top}(f \circledast \bar{\Psi}) = \Phi^{\top}\mathbb{H}_d(f)\Psi$  [Equation 40]

Here, f ⊛ Ψ̄ denotes the multi-channel convolution in the CNN. For the case of average pooling, the pooling operator Φ^⊤ is given by Equation 41 below.

$\Phi^{\top} = \frac{1}{\sqrt{2}}\begin{bmatrix}1 & 1 & 0 & 0 & \ldots & 0 & 0\\ 0 & 0 & 1 & 1 & \ldots & 0 & 0\\ & \vdots & & & \ddots & \vdots & \\ 0 & 0 & 0 & 0 & \ldots & 1 & 1\end{bmatrix} \in \mathbb{R}^{\frac{n}{2}\times n}$  [Equation 41]
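The two properties of this operator that are used below, namely Φ^⊤Φ = I (used for Equation 47) and ΦΦ^⊤ = P_{R(Φ)} (used for Equation 45), are easy to verify numerically; a minimal sketch:

```python
import numpy as np

# The average pooling Phi^T of Equation 41 (n = 8 for illustration).
n = 8
PhiT = np.zeros((n // 2, n))
for i in range(n // 2):
    PhiT[i, 2 * i:2 * i + 2] = 1 / np.sqrt(2)

print(np.allclose(PhiT @ PhiT.T, np.eye(n // 2)))   # Phi^T Phi = I
P = PhiT.T @ PhiT                                   # Phi Phi^T
print(np.allclose(P @ P, P))   # a projection P_R(Phi), not c*I
```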

As shown in (a) of FIG. 4, the U-Net has the by-pass connection to compensate for the high frequencies lost during pooling. Combining the two, the convolutional framelet coefficients may be represented by Equation 42 below.

$C_{ext} = \Phi_{ext}^{\top}(f \circledast \bar{\Psi}) = \begin{bmatrix}B\\ S\end{bmatrix}$  [Equation 42]

Here, Φ_ext^⊤ refers to the extended pooling, B refers to the by-pass component, and S refers to the low-pass subband. Φ_ext^⊤ may be given by Equation 43 below, and B and S may be represented as Equation 44 below.

$\Phi_{ext}^{\top} := \begin{bmatrix}I\\ \Phi^{\top}\end{bmatrix}$  [Equation 43]

$B = f \circledast \bar{\Psi},\qquad S = \Phi^{\top}(f \circledast \bar{\Psi})$  [Equation 44]

Thus, Equation 45 below may be derived using the above-mentioned equations.

$\Phi_{ext}\Phi_{ext}^{\top} = I + \Phi\Phi^{\top}$  [Equation 45]

Here, ΦΦ^⊤ = P_{R(Φ)} for the case of average pooling. Thus, Φ_ext does not satisfy the frame condition, which results in artifacts.

In other words, because the signal reconstructed through a neural network of the general U-Net architecture does not represent the original signal perfectly and overemphasizes the low-frequency components, a blurred reconstruction signal may be generated.

Dual Frame U-Net

As described above, one simple fix for the aforementioned limitation is to use the dual frame. Specifically, using Equation 10 above, the dual frame for Φ_ext in Equation 43 above may be obtained as Equation 46 below.

$\tilde{\Phi}_{ext} = (\Phi_{ext}\Phi_{ext}^{\top})^{-1}\Phi_{ext} = (I + \Phi\Phi^{\top})^{-1}[I\ \ \Phi]$  [Equation 46]

Here, thanks to the matrix inversion lemma and the orthogonality Φ^⊤Φ = I for the case of average pooling, an embodiment of the inventive concept may obtain Equation 47 below.

$(I + \Phi\Phi^{\top})^{-1} = I - \Phi(I + \Phi^{\top}\Phi)^{-1}\Phi^{\top} = I - \tfrac{1}{2}\Phi\Phi^{\top}$  [Equation 47]

Thus, the dual frame may be given by Equation 48 below.

$\tilde{\Phi}_{ext} = (I - \Phi\Phi^{\top}/2)[I\ \ \Phi] = [I - \Phi\Phi^{\top}/2\ \ \ \Phi/2]$  [Equation 48]
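Both the closed form of Equation 48 and the frame condition of Equation 9 can be verified numerically for the extended pooling; a minimal sketch:

```python
import numpy as np

# Average pooling Phi (Equation 41), extended pooling Phi_ext, and its
# dual frame (Equations 46-48); n = 8 for illustration.
n = 8
PhiT = np.zeros((n // 2, n))
for i in range(n // 2):
    PhiT[i, 2 * i:2 * i + 2] = 1 / np.sqrt(2)
Phi = PhiT.T

Phi_ext = np.hstack([np.eye(n), Phi])                 # rows of [I; Phi^T]
dual = np.linalg.inv(Phi_ext @ Phi_ext.T) @ Phi_ext   # Equation 46
closed = np.hstack([np.eye(n) - Phi @ PhiT / 2, Phi / 2])  # Equation 48

print(np.allclose(dual, closed))                  # matrix inversion lemma
print(np.allclose(dual @ Phi_ext.T, np.eye(n)))   # frame condition, Eq. 9
```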

For a given framelet coefficient C_ext in Equation 42 above, the reconstruction using the dual frame may be given by Equation 49 below.

$\hat{C}_{ext} := \tilde{\Phi}_{ext}C_{ext} = \Big(I - \frac{\Phi\Phi^{\top}}{2}\Big)B + \frac{1}{2}\Phi S = B + \frac{1}{2}\underbrace{\Phi}_{\text{unpooling}}\overbrace{(S - \Phi^{\top}B)}^{\text{residual}}$  [Equation 49]

Constructing the neural network based on Equation 49 above suggests a network structure for the dual frame U-Net. More specifically, unlike the U-Net, the residual signal at the low resolution is upsampled through the unpooling layer. This may be easily implemented using an additional by-pass connection for the low-resolution signal, as shown in (b) of FIG. 4. This simple fix allows the network to satisfy the frame condition. However, there still exists noise amplification, since Φ_extΦ_ext^⊤ = I + P_{R(Φ)}.

Similar to the U-Net, the final step of the dual frame U-Net is the concatenation and the multi-channel convolution, which is equivalent to applying the inverse Hankel operation ℍ_d^†(⋅) to the processed framelet coefficients multiplied by the local basis. Specifically, the concatenated signal may be given by Equation 50 below, and the final convolution may be equivalently computed by Equation 51 below.

$W = \begin{bmatrix}B & \frac{1}{2}\Phi(S - \Phi^{\top}B)\end{bmatrix}$  [Equation 50]

$\hat{f} = \mathbb{H}_d^{\dagger}\left(W\begin{bmatrix}\Xi^{\top}\\ \Theta^{\top}\end{bmatrix}\right) = \mathbb{H}_d^{\dagger}(B\,\Xi^{\top}) + \frac{1}{2}\mathbb{H}_d^{\dagger}(\Phi S\,\Theta^{\top}) - \frac{1}{2}\mathbb{H}_d^{\dagger}(\Phi\Phi^{\top}B\,\Theta^{\top}) = \mathbb{H}_d^{\dagger}\big(\mathbb{H}_d(f)\Psi\,\Xi^{\top}\big) = \frac{1}{d}\sum_{i=1}^{q}\big(f \circledast \bar{\psi}_i \circledast \xi_i\big)$  [Equation 51]

Here, the third equality in Equation 51 above comes from S = Φ^⊤(f ⊛ Ψ̄) = Φ^⊤B, which makes the second and third terms cancel. Therefore, by choosing the local filter bases such that ΨΞ^⊤ = I, the right-hand side of Equation 51 above becomes equal to f, satisfying the recovery condition.

Tight Frame U-Net

Another way to improve the performance of the U-Net with minimum noise amplification is to use tight filter-bank frames or wavelets. Specifically, the non-local basis Φ may be composed of a filter bank, as represented in Equation 52 below.

$\Phi = [T_1\ \ldots\ T_L]$  [Equation 52]

Here, T_k denotes the k-th subband operator.

An embodiment of the inventive concept assumes that the filter bank is tight, i.e., that Equation 53 below holds for some scalar c > 0.

$\Phi\Phi^{\top} = \sum_{k=1}^{L}T_kT_k^{\top} = cI$  [Equation 53]

The convolutional framelet coefficients including a by-pass connection may be written as Equation 54 below.

$C_{ext} := \Phi_{ext}^{\top}(f \circledast \bar{\Psi}) = [B^{\top}\ S_1^{\top}\ \ldots\ S_L^{\top}]^{\top}$  [Equation 54]

Here, Φ_ext := [I T₁ ... T_L], B = f ⊛ Ψ̄, and S_k = T_k^⊤B.

An embodiment of the inventive concept may see that Φ_ext is also a tight frame, by Equation 55 below.

$\Phi_{ext}\Phi_{ext}^{\top} = I + \sum_{k=1}^{L}T_kT_k^{\top} = (c+1)I$  [Equation 55]

There are several important tight filter-bank frames. One of the simplest is the Haar wavelet transform with low-pass and high-pass subband decomposition, where T₁ is the low-pass subband operator, which is equivalent to the average pooling in Equation 41 above, and T₂ is the high-pass subband operator. The high-pass operator T₂ may be given by Equation 56 below.

$T_2 = \frac{1}{\sqrt{2}}\begin{bmatrix}1 & -1 & 0 & 0 & \ldots & 0 & 0\\ 0 & 0 & 1 & -1 & \ldots & 0 & 0\\ & \vdots & & & \ddots & \vdots & \\ 0 & 0 & 0 & 0 & \ldots & 1 & -1\end{bmatrix}^{\top}$  [Equation 56]

An embodiment of the inventive concept may see that T₁T₁^⊤ + T₂T₂^⊤ = I, so the Haar wavelet frame is tight. In the corresponding tight frame U-Net structure illustrated in (c) of FIG. 4, in contrast to the U-Net structure in (a) of FIG. 4, there is an additional high-pass branch. As shown in (c) of FIG. 4, similar to the U-Net in (a) of FIG. 4, each subband signal in the tight frame U-Net is by-passed to the individual concatenation layers. The convolution layer after the concatenation layers provides a weighted sum whose weights are learned from data. This simple fix makes the frame tight.
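A minimal numerical check of this tightness (illustrative code; T₁ is the average pooling of Equation 41 and T₂ the high-pass of Equation 56):

```python
import numpy as np

# Haar subband operators: T1 = average pooling (low-pass), T2 = the
# high-pass of Equation 56; n = 8 for illustration.
n = 8
T1 = np.zeros((n, n // 2)); T2 = np.zeros((n, n // 2))
for i in range(n // 2):
    T1[2 * i:2 * i + 2, i] = 1 / np.sqrt(2)
    T2[2 * i, i], T2[2 * i + 1, i] = 1 / np.sqrt(2), -1 / np.sqrt(2)

print(np.allclose(T1 @ T1.T + T2 @ T2.T, np.eye(n)))  # tight, c = 1

f = np.random.default_rng(1).standard_normal(n)
print(np.allclose(T1 @ (T1.T @ f) + T2 @ (T2.T @ f), f))  # recovery
```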

In other words, the neural network of the tight frame U-Net structure shown in (c) of FIG. 4 may be expressed as a neural network which has the tight filter bank or the wavelet as the non-local basis so as to satisfy the convolutional frame condition. Herein, the non-local basis of the tight frame U-Net satisfies the tight frame condition.

The nonlinear operation may restrict the signal to be sparse for various input signals f or may restrict the signal to be positive. This enables the neural network to learn various input signals or transformed signals, and it enables the local and non-local basis vectors of the linear transform operation to find various solutions. Moreover, the nonlinear operation may be constructed in a form that satisfies the reconstruction condition.

Residual learning is applicable to the neural network to enhance the learning effect. Residual learning makes the local basis of the linear transform lower rank, so the unnecessary load of the neural network may be greatly reduced. Such internal or external by-pass connections may overcome the difficulty of deep network training and improve the performance of removing local and non-local noise.

FIGS. 5A to 5C are drawings illustrating a standard U-Net architecture (5A), a dual frame U-Net architecture (5B), and a tight frame U-Net architecture (5C).

As shown in FIGS. 5A to 5C, each network may include convolution layers for performing the linear transform operation, batch normalization layers for performing the normalization operation, rectified linear unit (ReLU) layers for performing the nonlinear function operation, and path connections with concatenation. Specifically, each stage may include four sequential layers composed of convolution with 3×3 kernels, batch normalization, and ReLU layers. The last stage may include two sequential layers and the last layer, which includes only a convolution layer with a 1×1 kernel. The number of channels for each convolution layer is illustrated in FIGS. 5A to 5C; the number of channels is doubled after each pooling layer. The differences between the U-Net and the dual frame U-Net or the tight frame U-Net come from the pooling and unpooling layers.

As shown in FIG. 5A, because the pooling and unpooling layers included in the U-Net structure do not satisfy the frame condition, the standard U-Net shows a limited reconstruction level from the signal point of view. An embodiment of the inventive concept mathematically proves the limit of the existing U-Net structure and formulates a theory capable of overcoming that limit, thus providing the dual frame U-Net and the tight frame U-Net, which are neural network architectures satisfying the frame condition.

As shown in FIG. 5B, the dual frame U-Net is a structured neural network architecture obtained by expressing the mathematical expression of the U-Net as the dual frame; it is proposed to have a similar amount of computation to the U-Net structure and to satisfy the frame condition by adding a residual path while maintaining the general U-Net structure.

As shown in FIG. 5C, the tight frame U-Net decomposes the signal into a low-frequency domain and a high-frequency domain using the wavelet. The low-frequency domain is decomposed stage by stage, in the same manner as the operation performed in the general U-Net structure, while the high-frequency domain is passed to the opposite layer along a designed path, so the signal is reconstructed without losing the high-frequency content.
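The subband bookkeeping of the tight frame U-Net can be sketched on a 1-D signal. The code below is an illustrative single linear stage (learned convolutions omitted, hypothetical function names): recombining the low band with the by-passed high band recovers the input exactly, which is the property the high-frequency path is designed to preserve.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 16
f = rng.standard_normal(n)

low = lambda x: (x[0::2] + x[1::2]) / np.sqrt(2)     # T1^T: Haar low-pass
high = lambda x: (x[0::2] - x[1::2]) / np.sqrt(2)    # T2^T: Haar high-pass
up_low = lambda s: np.repeat(s, 2) / np.sqrt(2)      # T1: unpooling
up_high = lambda s: (np.repeat(s, 2)
                     * np.tile([1.0, -1.0], len(s)) / np.sqrt(2))  # T2

S1, S2 = low(f), high(f)   # S1 goes down the U-Net; S2 is by-passed
# (In the full network S1 would recurse through the deeper stages.)
f_hat = up_low(S1) + up_high(S2)
print(np.allclose(f_hat, f))   # True: no high-frequency loss
```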

(a) to (c) of FIG. 6 are drawings illustrating an example of reconstruction results by the general, dual frame, and tight frame U-Nets at various sparse-view reconstructions. The left box in each image region illustrates the enlarged images, and the right box illustrates the difference images. The number written on the images is the normalized mean square error (NMSE) value.

As shown in the enlarged images and the difference images in (a) to (c) of FIG. 6, the U-Net produces blurred edges in many areas, while the dual frame and tight frame U-Nets enhance the high-frequency characteristics of the images. In other words, the dual frame U-Net and the tight frame U-Net according to an embodiment of the inventive concept may reduce the blurring that is the limit of the general U-Net and, by simultaneously learning various sparse-view images, may reconstruct all sparse-view images using a single neural network without adjusting additional parameters.

FIG. 7 is a block diagram illustrating a configuration of an image processing device according to an embodiment of the inventive concept and illustrates a configuration of a device which performs the methods of FIGS. 3 to 6.

Referring to FIG. 7, an image processing device 700 according to an embodiment of the inventive concept may include a reception unit 710 and a reconstruction unit 720.

The reception unit 710 may receive sparse-view CT data.

The reconstruction unit 720 may reconstruct an image for the sparse-view CT data using a neural network for a learning model which satisfies a predetermined frame condition and is based on the convolutional framelets.

Herein, the reconstruction unit 720 may reconstruct the image for the sparse-view CT data using the neural network of the learning model which satisfies the frame condition and is learned by the residual learning.

The neural network used in the device according to an embodiment of the inventive concept may include a neural network which generates a learning model satisfying the frame condition through a mathematical analysis based on the convolutional framelets and is trained using that learning model, and may include a multi-resolution neural network including pooling and unpooling layers.

Herein, the multi-resolution neural network may include a structured dual frame neural network, obtained by expressing the mathematical expression of the multi-resolution neural network as the dual frame, and a structured tight frame neural network, obtained by decomposing the multi-resolution neural network into a low-frequency domain and a high-frequency domain using wavelets.

In addition, the neural network may include a by-pass connection from the pooling layer to the unpooling layer.

Although some description is omitted for the image processing device 700 of FIG. 7, it is apparent to those skilled in the art that the respective components of FIG. 7 may include all the details described with reference to FIGS. 1 to 6.

The foregoing devices may be realized by hardware elements, software elements and/or combinations thereof. For example, the devices and components illustrated in the exemplary embodiments of the inventive concept may be implemented in one or more general-use computers or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable array (FPA), a programmable logic unit (PLU), a microprocessor or any device which may execute instructions and respond. A processing unit may implement an operating system (OS) or one or more software applications running on the OS. Further, the processing unit may access, store, manipulate, process and generate data in response to execution of software. It will be understood by those skilled in the art that although a single processing unit may be illustrated for convenience of understanding, the processing unit may include a plurality of processing elements and/or a plurality of types of processing elements. For example, the processing unit may include a plurality of processors or one processor and one controller. Also, the processing unit may have a different processing configuration, such as a parallel processor.

Software may include computer programs, codes, instructions or one or more combinations thereof and may configure a processing unit to operate in a desired manner or may independently or collectively control the processing unit. Software and/or data may be permanently or temporarily embodied in any type of machine, components, physical equipment, virtual equipment, computer storage media or units or transmitted signal waves so as to be interpreted by the processing unit or to provide instructions or data to the processing unit. Software may be dispersed throughout computer systems connected via networks and may be stored or executed in a dispersed manner. Software and data may be recorded in one or more computer-readable storage media.

The methods according to the above-described exemplary embodiments of the inventive concept may be implemented with program instructions which may be executed through various computer means and may be recorded in computer-readable media. The media may also include, alone or in combination with the program instructions, data files, data structures, and the like. The program instructions recorded in the media may be designed and configured specially for the exemplary embodiments of the inventive concept or be known and available to those skilled in computer software. Computer-readable media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as compact disc-read only memory (CD-ROM) disks and digital versatile discs (DVDs); magneto-optical media such as floptical disks; and hardware devices that are specially configured to store and perform program instructions, such as read-only memory (ROM), random access memory (RAM), flash memory, and the like. Program instructions include both machine codes, such as produced by a compiler, and higher level codes that may be executed by the computer using an interpreter. The described hardware devices may be configured to act as one or more software modules to perform the operations of the above-described exemplary embodiments of the inventive concept, or vice versa.

According to embodiments of the inventive concept, the image processing device may reconstruct a sparse-view CT image as a high-quality image using the neural network for the learning model satisfying the predetermined frame condition.

According to embodiments of the inventive concept, the image processing device may reconstruct a high-quality image, while having a similar amount of computation to the existing neural network architecture, by mathematically proving the limit of the existing multi-resolution neural network, for example, the U-Net structure, formulating the theory capable of overcoming the limit based on it, providing the neural network satisfying the frame condition, and reconstructing the sparse-view CT image by means of the neural network.

While a few exemplary embodiments have been shown and described with reference to the accompanying drawings, it will be apparent to those skilled in the art that various modifications and variations can be made from the foregoing descriptions. For example, adequate effects may be achieved even if the foregoing processes and methods are carried out in a different order than described above, and/or the aforementioned elements, such as systems, structures, devices, or circuits, are combined or coupled in different forms and modes than as described above or are substituted or switched with other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents to the claims are within the scope of the following claims.

What is claimed is:
 1. An image processing method, comprising: receiving sparse-view computed tomography (CT) data; and reconstructing an image for the sparse-view CT data using a neural network of a learning model satisfying a predetermined frame condition, wherein the neural network comprises: a multi-resolution neural network including pooling and unpooling layers; and a structured tight frame neural network, obtained by decomposing a structured dual frame neural network and the multi-resolution neural network into a low-frequency domain and a high-frequency domain using wavelets, by expressing a mathematical expression of the multi-resolution neural network as a dual frame Φ̃, wherein the predetermined frame condition is represented by the equation Φ̃Φ^⊤ = I.
 2. The image processing method of claim 1, wherein the reconstructing of the image comprises: reconstructing the image for the sparse-view CT data using the neural network of the learning model which satisfies the frame condition and is learned by residual learning.
 3. The image processing method of claim 1, wherein the neural network comprises: a neural network which generates the learning model satisfying the frame condition through a mathematical analysis based on convolutional framelets and is learned by the learning model.
 4. The image processing method of claim 1, wherein the neural network comprises: a by-pass connection from the pooling layer to the unpooling layer.
 5. An image processing method, comprising: receiving sparse-view CT data; and reconstructing an image for the sparse-view CT data using a neural network for a learning model which satisfies a predetermined frame condition and is based on convolutional framelets, wherein the neural network comprises: a multi-resolution neural network including pooling and unpooling layers; and a structured tight frame neural network, obtained by decomposing a structured dual frame neural network and the multi-resolution neural network into a low-frequency domain and a high-frequency domain using wavelets, by expressing a mathematical expression of the multi-resolution neural network as a dual frame Φ̃, wherein the predetermined frame condition is represented by the equation Φ̃Φ^⊤ = I.
 6. An image processing device, comprising: a reception unit configured to receive sparse-view CT data; and a reconstruction unit configured to reconstruct an image for the sparse-view CT data using a neural network of a learning model satisfying a predetermined frame condition, wherein the neural network comprises: a multi-resolution neural network including pooling and unpooling layers; and a structured tight frame neural network, obtained by decomposing a structured dual frame neural network and the multi-resolution neural network into a low-frequency domain and a high-frequency domain using wavelets, by expressing a mathematical expression of the multi-resolution neural network as a dual frame Φ̃, wherein the predetermined frame condition is represented by the equation Φ̃Φ^⊤ = I.
 7. The image processing device of claim 6, wherein the reconstruction unit is configured to: reconstruct the image for the sparse-view CT data using the neural network of the learning model which satisfies the frame condition and is learned by residual learning.
 8. The image processing device of claim 6, wherein the neural network comprises: a neural network which generates the learning model satisfying the frame condition through a mathematical analysis based on convolutional framelets and is learned by the learning model.
 9. The image processing device of claim 6, wherein the neural network comprises: a by-pass connection from the pooling layer to the unpooling layer.